Abstract

The paper discusses the potential benefits of integrating
technology,  cognitive  science,  and  psychometric  theory.  It  is  argued 
that  even  though  adaptive  testing,  as  currently  implemented,  is  an 
important  achievement,  it  will  be  necessary  to  pay  close  attention  to 
the  psychological  foundation  of  tests  to  continue  advancing  the  state  of 
the  art.  Such  an  effort  requires  construct  validation  in  the  broadest 
sense,  as  well  as  focusing  on  items  and  why  they  differ  with  respect  to 
psychometric parameters, especially difficulty. This approach opens the
possibility of generating items with better control of their psychometric
characteristics and ultimately the development of computer-based
tests that are solidly anchored in psychological theory.

Speculations on the Future of Test Design

Isaac  I.  Bejar 

Introduction 

I  am  grateful  for  the  opportunity  to  write  the  final  chapter  for  a 
book  concerned  with  the  improvement  of  test  design.  I  do  not  envy  for 
one  moment  the  task  of  the  contributors  to  this  volume,  for  theirs  is  a 
difficult  responsibility.  By  contrast,  my  task  is  to  speculate  on  the 
future  of  test  design,  not  so  difficult  a  task  when,  as  in  this  case, 
the  contributors  have  provided  such  stimulating  descriptions  of  their 
research  programs. 

The chapter is divided into two major sections. In the first section
I  identify  three  areas  of  test  design  that  are  bound  to  be  significantly 
influenced  by  the  increasing  availability  of  technology.  These  three 
areas  are  computer-assisted  test  assembly,  computer-assisted  test 
administration,  and  computer-assisted  test  generation.  All  three  will 
be  significantly  affected  by  the  sheer  presence  of  technology  and  thus 
there  is  the  danger  that  they  may  be  affected  only  in  superficial  ways. 
Contributions  such  as  the  ones  presented  in  this  volume  will  be  largely 
responsible  for  effecting  the  hoped-for  fundamental  change.  The  second 
section  argues  that  a  fundamental  change  is  more  likely  to  come  about  by 
an  integration  of  cognitive  psychology  and  psychometric  theory. 

Technology  and  Test  Design 

Future  test  designers  will  have  at  their  disposal  the  ever-growing 
fruits of the information revolution. The evidence for this revolution
is everywhere, but most significantly it is evidenced by the increasing
presence of microcomputers at school, at home, and at work. For test
designers,  the  increasing  availability  of  technology  is  a  mixed 
blessing.  Although  such  growth  creates  the  opportunity  to  develop 
better  tests  or  administer  them  more  efficiently,  it  also  creates  a 
pressure  to  computerize  tests  and  use  technology  superficially.  Three 
areas  of  test  design  that  are  vulnerable  to  these  pressures  are 

1.  Administration  of  tests  by  computers, 

2.  Computer-assisted  test  assembly, 

3.  The  generation  of  items  by  computer. 

Administration  of  Tests 

The  administration  of  tests  by  computers  is  no  longer  just  a 
possibility,  it  is  a  reality.  Moreover,  it  stands  as  one  of  the 
proudest  achievements  of  psychometrics  because  the  theory  that  would 
make adaptive testing a reality, Item Response Theory (IRT; Lord, 1980),
was  developed  before  computers  were  widely  available.  Had  this  theory 
not  been  developed  it  is  likely  that  in  the  current  technological 
revolution  computers  would  have  been  applied  to  testing  in  a  shallow 
manner. That is, computers probably would have been used as automated answer
sheets  rather  than  as  a  means  of  delivering  new  kinds  of  tests  or  more 
efficient  tests. 

By  the  early  1970s  computer  technology  had  reached  the  point  where 
it  was  possible  simultaneously  to  test  several  examinees  more  or  less 
economically.  The  pioneering  efforts  of  Weiss  (1974)  capitalized  on 
this  event  and  on  the  availability  of  IRT  to  begin  an  extensive  research 
program  on  the  psychometric  and  practical  issues  of  adaptive  testing. 
In an adaptive test, the computer's job is not merely to present the
item  and  score  it  but  also  to  determine  which  item  should  be 
administered  next,  given  the  student's  current  level  of  performance. 
Although  adaptive  tests  usually  use  multiple-choice  items  and  thus  give 
the impression that a paper-and-pencil test has been transferred to a
computer,  in  reality  different  examinees  are  responding  to  different 
tests  assembled  by  the  computer  for  each  examinee  so  that  the  resulting 
score  may  be  most  precise  for  an  individual  test  taker. 
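
The selection step described above can be sketched in a few lines. This is only an illustration under stated assumptions: it assumes a two-parameter logistic IRT model and a maximum-information selection rule, neither of which is specified in the text, and the item pool, item names, and parameter values are hypothetical.

```python
import math


def prob_correct(theta, a, b):
    """Probability of a correct response under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability level theta."""
    p = prob_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta, pool, administered):
    """Return the id of the unused item that is most informative at theta."""
    candidates = [item_id for item_id in pool if item_id not in administered]
    return max(
        candidates,
        key=lambda item_id: item_information(theta, *pool[item_id]),
    )


# Hypothetical pool: item id -> (discrimination a, difficulty b).
pool = {"easy": (1.0, -1.5), "medium": (1.0, 0.0), "hard": (1.0, 1.5)}

# A low provisional ability estimate leads to an easy item, a high one to a hard item.
next_for_low = select_next_item(-1.4, pool, administered=set())
next_for_high = select_next_item(1.2, pool, administered={"medium"})
```

In an operational system the ability estimate would be updated after every response and the loop repeated until some precision target is reached; that part is omitted here.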

It  is  tempting  to  say  that  adaptive  testing  became  possible  as  a 
result  of  coupling  computers  and  IRT.  The  fact  is  that  Binet  was  doing 
pretty  much  the  same  thing  at  the  turn  of  this  century.  Of  course,  then 
it  was  the  psychometrician,  not  the  computer,  that  was  selecting  and 
scoring  the  items.  Adaptive  testing  is  thus  an  efficient  implementation 
of  a  long-standing  idea.  Nevertheless,  it  is  still  a  significant 
achievement,  especially  considering  what  would  have  happened  in  the 
absence  of  IRT — namely,  the  blind  transfer  of  items  to  a  computer 
screen.  That  achievement  is  about  to  become  a  practical  reality.  The 
military  and  private  testing  organizations  have  both  been  seriously 
contemplating  the  practical  implementation  of  adaptive  testing  systems. 
In  some  cases  concrete  steps  have  already  been  taken  toward  their 
implementation.  Although  it  is  too  early  to  tell  what  success  these 
initial  efforts  will  encounter,  computers  are  becoming  so  pervasive  that 
not  to  give  a  test  by  computer  may  soon  appear  archaic.  Chances  are 
that  there  will  thus  be  more  computer  administration  of  tests,  although 
not  necessarily  because  they  are  better  psychometrically.  It  will 
therefore  be  up  to  the  test  designer  to  make  the  best  possible  use  of 
the available technology.

While adaptive testing has been moving forward, technology,
psychometrics  and  substantive  theory  have  not  remained  static,  and  the 
integration  of  these  three  opens  up  additional  opportunities.  For 
example,  most  adaptive  testing  research  has  been  limited  to  verbal 
items.  This  was  so  because  until  recently  it  was  too  expensive  to 
display  symbols  and  graphics  on  a  CRT  (cathode  ray  tube).  That  has 
changed  and  in  principle  test  material  can  even  be  presented  in  the  form 
of  television  images  by  means  of  videodisc  players.  A  videodisc  permits 
access  to  up  to  54,000  television  frames  and,  for  example,  language 
skills  could  be  tested  in  very  realistic  contexts  by  presenting  items  as 
audiovisual  sequences.  On  the  psychometric  front,  models  that  go  beyond 
the classification of responses into "correct" and "incorrect" have been
formulated (e.g., Andersen, 1977; Bock, 1972; Fischer, 1973; Samejima,
1969, 1972; Embretson, Chapter 7, this volume; Scheiblechner, Chapter 8,
this  volume;  Andrich,  Chapter  9,  this  volume)  but  await  tests  that  make 
use  of  their  capacity.  Finally,  on  the  theoretical  side,  experimental 
psychologists  have  taken  seriously  Cronbach's  exhortation  (Cronbach, 
1957)  to  unite  experimental  and  differential  psychology.  As  a  result, 
there  have  been  serious  attempts  since  the  1960s  to  understand  test 
performance  in  the  light  of  substantive,  not  just  quantitative,  theories 
(e.g.,  Carroll,  1976;  Embretson,  1983;  Lansman,  Donaldson,  Hunt,  & 
Yantis,  1982).  In  short,  the  materials  are  there  not  only  to  improve 
current  practice  but  also  to  chart  new  courses. 

Computers  and  Test  Assembly 

In  the  1960s,  one  would  have  predicted  that  the  computer's  first 
inroad  into  test  design  would  be  in  assisting  with  the  test  development 
process rather than in administering tests. As just shown, however,
test administration by computer is becoming a reality; by contrast, the
possibilities of using computers for test assembly and test creation
have hardly been exploited. Before speculating on how test assembly and
item generation can benefit from the integration of psychometrics,
technology, and psychological theory, I first review the state of the art.

A  key  problem  in  test  development  is  maintaining  a  large  item  pool 
from  which  items  may  be  drawn,  according  to  some  set  of  specifications, 
to  assemble  the  final  form.  For  the  most  part,  item  pools  are  kept  in 
filing  cabinets.  When  the  time  comes,  however,  to  assemble  another  form 
it  might  be  wise  to  sweep  the  floor,  because  often  the  test  assembler 
spreads  the  cards  on  the  floor  to  select  (through  an  as  yet  unpublished 
procedure)  a  set  of  items.  Typically,  the  items  in  the  pool  have  been 
pretested,  a  requirement  imposed  by  the  actuarial  nature  of  test 
development.  Of  course,  that  is  not  the  end  of  the  process.  Once  a 
tentative  set  of  items  has  been  chosen  it  goes  through  numerous  revision 
stages  in  which  some  items  are  deleted  and  still  others  added.  The 
criteria  for  reviewing  items  include  the  following: 

1.  Distribution  of  item  statistics  such  as 
difficulty  and  discrimination, 

2.  distribution  of  distractors, 

3.  lexical  overlap, 

4.  conceptual overlap,

Some of these criteria involve only surface characteristics of the
items. Were it not for the fact that items are usually pretested, the
test  assembler  would  often  have  an  erroneous  idea  of  the  difficulty  and 
discrimination  of  the  item  (e.g.,  Bejar,  1983a).  It  is  in  this  sense 
that  current  test  design  is  an  actuarial  science.  Precisely  because 
tests are assembled on the basis of surface characteristics, the process
is  amenable  to  computerization.  Such  computerization  will  take  place  if 
for  no  other  reason  than  that  it  increases  productivity. 

For  example,  computers  can,  to  the  extent  that  the  item  pool 
permits,  assemble  a  form  to  meet  the  requirements  just  enumerated  while 
simultaneously  attempting  to  meet  some  psychometric  criterion,  such  as 
the  distribution  of  item  difficulty  and  discrimination.  The  ideal 
system  would  be  flexible  enough  to  accommodate  the  styles  of  different 
test  designers,  and  it  would  also  be  interactive.  For  example,  the 
system  should  present  the  test  designer  with  the  option  of  either 
letting  the  computer  suggest  a  form  or  allowing  the  test  designer  to 
assemble  a  form  gradually.  In  either  case  the  system  should  be 
interactive  in  the  sense  of  allowing  the  test  designer  to  ascertain  how 
well  the  design  goals  have  been  met  as  often  as  the  test  designer 
desires.  Naturally,  the  system  should  be  powerful  enough  to  access 
sizable  item  pools  instantly,  regardless  of  their  graphic  complexity. 

Components  of  some  of  these  ideas  are  being  contemplated  or  in  some 
cases have been implemented (e.g., Yen, 1983), but clearly there is room
for  improvement.  For  example,  while  the  computer  is  in  the  process  of 
selecting  a  set  of  items  it  may  easily  produce  a  report  on  the 
availability of different item types for the test designer, who in
turn could take the necessary steps to replenish the item pool
following the suggestions of the computer. It is at this point,
however, that the actuarial nature of current test design makes itself
obvious. If, for example, the computer reports that easy items of a
certain  category  are  running  out  the  test  designer  can,  at  best,  make 
the  arrangements  to  pretest  another  batch  of  items  and  hope  that  among 
them  there  will  be  a  large  number  of  easy  items. 
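
The assembly and reporting ideas in the preceding paragraphs can be made concrete with a small sketch. It is not a description of any existing system: the greedy selection rule, the blueprint format, the field names, and the sample pool are all hypothetical choices made for illustration.

```python
def assemble_form(pool, blueprint, target_difficulty):
    """Greedy sketch: per content category, take the required number of
    items whose difficulty lies closest to the target.

    Returns the selected items and a report of categories in short supply.
    """
    form, shortages = [], {}
    for category, needed in blueprint.items():
        candidates = sorted(
            (item for item in pool if item["category"] == category),
            key=lambda item: abs(item["difficulty"] - target_difficulty),
        )
        form.extend(candidates[:needed])
        if len(candidates) < needed:
            shortages[category] = needed - len(candidates)
    return form, shortages


def summarize(form):
    """Let the designer check how well the design goals have been met."""
    difficulties = [item["difficulty"] for item in form]
    mean = sum(difficulties) / len(difficulties) if difficulties else None
    return {"n_items": len(form), "mean_difficulty": mean}


# Hypothetical pretested pool; difficulty is on an arbitrary scale.
pool = [
    {"id": "A1", "category": "algebra", "difficulty": -1.0},
    {"id": "A2", "category": "algebra", "difficulty": 0.1},
    {"id": "A3", "category": "algebra", "difficulty": 1.2},
    {"id": "G1", "category": "geometry", "difficulty": 0.3},
]
blueprint = {"algebra": 2, "geometry": 2}

form, shortages = assemble_form(pool, blueprint, target_difficulty=0.0)
summary = summarize(form)
```

Here the shortage report flags that the geometry category needs replenishing, which is exactly the point at which the designer can do no more than pretest another batch of items.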

A system to implement these ideas, to my knowledge, has neither
been developed nor is it under serious consideration. It is, however, a
question of time before the economics of the present labor-intensive
approach  becomes  unbearable.  Because  substantial  planning  is  required 
to  develop  such  a  system,  it  would  be  desirable  to  begin  now  before  the 
need  becomes  urgent. 


Using  the  Computer  to  Generate  Items 


Computers can be useful for test design because they can advise the
test designer about the characteristics of unpretested items and,
ultimately, generate items according to a prescription. These
activities would, of course, be much more difficult to achieve;
moreover, they would make the system described earlier unnecessary because
in generating items the computer would make sure that they meet the
required specifications. That is, rather than maintaining large item
pools, as is now done, a point may be reached where submitting a
prescription for a test to a computer that would produce a test meeting
all the content and psychometric specifications would be feasible. Are
we anywhere near the point where such tests are possible? A brief
review  of  the  state  of  the  art  is  very  much  in  order  at  this  juncture. 

The  essence  of  the  item  generation  process  as  it  is  currently 
practiced  was  described  by  Wesman  (1971): 

item  writing  is  essentially  creative — it  is  an  art.  Just  as 
there  can  be  no  set  of  formulas  for  producing  a  good  story 
or  a  good  painting,  so  there  can  be  no  set  of  rules  that 
guarantees  the  production  of  good  test  items.  Principles 
can  be  established  and  suggestions  offered,  but  it  is  the 
writer's  judgement  in  the  application — and  occasional 
disregard — of  these  principles  and  suggestions  that  determines 
whether  good  items  or  mediocre  ones  are  produced.  Each  item, 
as  it  is  being  written,  presents  new  problems  and  new 
opportunities.  Thus  item  writing  requires  an  uncommon 
combination  of  special  abilities  and  is  mastered  only  through 
extensive and critically supervised practice. (p. 81)

Chances  are  good  that  the  state  of  affairs  described  by  Wesman 
will prevail in the immediate future. Nevertheless, some efforts (e.g., Roid
& Haladyna, 1982) are under way to make item writing more a science than
an art. However, many of the procedures outlined in
the Roid and Haladyna work rest on a behaviorist foundation, which may
make  them  incompatible  with  the  cognitive  turn  that  psychology  and 
psychometrics  have  taken.  For  example,  one  item  generation  technique 
that  has  evolved  is  the  item  form  (Hively,  1974).  Hively  defined  an 
item form as a list of rules for generating a set of items. An item in
turn is defined as a "set of instructions telling how to evoke, detect
and score a specific bit of human performance. It must include the
directions for (1) presenting the stimuli, (2) recording the response,
and (3) deciding whether or not the response is appropriate" (Hively,
1974, p. 6).

From  a  psychometric  and  technological  standpoint,  item  forms  are 
attractive.  They  are  congenial  test  development  procedures  for 
psychometric  models  relying  on  the  assumption  that  the  items  in  a  test 
are a random sample from some universe of items. Generalizability theory
(Brennan, 1983; Cronbach, Gleser, Nanda, & Rajaratnam, 1972) is the most
prominent  model  based  on  that  assumption.  From  a  technological  point  of 
view,  item  forms  are  also  attractive  because  they  permit  a  computer  to 
generate  items.  That  is,  the  item  form  can  be  viewed  as  a  program  that 
in  principle  can  enumerate  all  the  items  that  belong  to  the  universe.  By 
the  random  choice  of  items  from  this  universe  a  test  can  be  formed  that 
satisfies  the  random  sampling  assumption.  Although  item  forms  and 
generalizability  theory  are  very  compatible,  the  psychometrics  of 
behavioristically  oriented  test  design  has  often  taken  the  form  of  very 
specific  models  (e.g.,  Harris,  Pastorok,  &  Wilcox,  1977)  rather  than  the 
broader  foundation  provided  by  generalizability  theory. 
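
To make the idea of an item form as a program concrete, the following sketch enumerates a small universe of items and draws a random test from it. The particular form (single-digit addition), the function names, and the item layout are hypothetical illustrations and are not taken from Hively's work; each generated item simply carries a stimulus and a scoring key, in the spirit of the three directions quoted above.

```python
import itertools
import random


def addition_item_form(addend_range=range(2, 10)):
    """Enumerate every item in the universe defined by this item form."""
    for a, b in itertools.product(addend_range, repeat=2):
        yield {"stimulus": f"{a} + {b} = ?", "key": a + b}


def score(item, response):
    """Decide whether the recorded response is appropriate."""
    return int(response) == item["key"]


def sample_test(universe, n_items, seed=None):
    """Draw a simple random sample of items, without replacement."""
    rng = random.Random(seed)
    return rng.sample(universe, n_items)


universe = list(addition_item_form())  # 8 x 8 = 64 items
test_form = sample_test(universe, n_items=5, seed=0)
```

Because every form produced this way is a random sample from an explicitly enumerated universe, it satisfies the sampling assumption by construction. What the sketch cannot do is produce easy or hard items on demand, which is the limitation taken up next.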

In short, the closest anyone has come to using computers for
item  generation  is  through  the  notion  of  an  item  form  from  which  a 
universe  of  items  can  be  generated.  In  my  estimation,  that  approach  to 
item  generation  is  too  specialized.  In  practice,  items  differ  with 
respect  to  a  number  of  characteristics,  and  a  useful  generation  scheme 
must  have  control  over  those  characteristics.  For  example,  a  useful 
generation  scheme  should  be  able  to  generate  easy  items  or  hard  items  at 
will. I suspect that to build such systems it is first necessary to
have an idea of what makes an item easy or hard. Some insights on
beginning to do this can be found in efforts concerned with the
development of computer programs that take tests (see, e.g., Evans,
1968; Green, 1964; Simon & Siklossy, 1972).

Cognitive  Science  and  Psychometrics 

A quick review of the history of psychology (e.g., Boring, 1950)
shows that throughout that history there has been a tension between the
study  of  consciousness  and  the  study  of  behavior.  As  Boring  put  it,  "in 
its  simplest  terms  the  basic  problem  about  the  data  of  psychology  is 
this:  Does  psychology  deal  with  the  data  of  consciousness  or  data  of 
behavior  or  both?"  (Boring,  1950,  p.  620).  These  tensions  between 
opposing  views  often  manifest  themselves  in  psychology,  as  well  as  in 
other  sciences,  in  the  form  of  dichotomies  (Newell,  1983).  Within 
psychology  behaviorism  once  dominated  the  field.  The  pendulum  has  now 
swung  and  mentalism,  in  the  form  of  cognitive  psychology,  now  has  the 
upper  hand.  It  seems  that  psychometrics  has  swung  along  with  the  rest 
of  psychology,  as  evidenced  by  the  vigor  of  efforts  to  cognitivize 
psychometrics.  Some  of  these  efforts  are  represented  in  this  volume. 
(The  reader  is  also  referred  to  Embretson,  1983,  for  an  approach  that 
encompasses  not  only  test  design,  which  she  calls  construct 
representation, but also an accounting of the relationship among scores
from  several  tests,  which  she  calls  nomothetic  span.) 

It  is  not  necessary  to  feel  sorry  for  the  behaviorist.  When 
behaviorism was champion, psychometricians of that persuasion had their
day, as demonstrated by the following excerpt from Osburn (1968)
regarding  test  design. 

Few  measurement  specialists  would  quarrel  with  the 
premise  that  the  fundamental  objective  of  achievement  testing 
is  generalization.  Yet  the  fact  is  that  current  procedures 
for  the  construction  of  achievement  tests  do  not  provide  an 
unambiguous  basis  for  generalization  to  a  well  defined 
universe of content. At worst, achievement tests consist of
arbitrary  collections  of  items  thrown  together  in  a  haphazard 
manner.  At  best,  such  tests  consist  of  items  judged  by 
subject  matter  experts  to  be  relevant  to  and  representative  of 
some  incompletely  defined  universe  of  content.  In  neither 
case  can  it  be  said  that  there  is  an  unambiguous  basis  for 
generalization.  This  is  because  the  method  of  generating 
items  and  the  criteria  for  the  inclusion  of  items  in  the  test 
cannot  be  stated  in  operational  terms. 

The  time-honored  way  out  of  this  dilemma  has  been  to 
resort  to  statistical  and  mathematical  strategies  in  an 
attempt  to  generalize  beyond  the  arbitrary  collection  of  items 
in  the  test.  By  far  the  most  popular  of  these  strategies  has 
been  to  invoke  the  concept  of  a  latent  variable — an  underlying 
continuum  which  represents  a  hypothetical  dimension  of  skill. 

(p.  95) 

The  notion  of  criterion-referenced  tests  was  popularized  by  Glaser 
and  Nitko  (1971)  shortly  thereafter,  and  for  over  a  decade  criterion- 
referenced  tests  enjoyed  the  endorsement  of  many  psychometricians  and 
clearly had an impact on test design (see Shoemaker, 1975). It is
perhaps  no  coincidence  that  critics,  once  behaviorism  ceased  to  be  a 
major  influence  in  psychology,  began  finding  all  sorts  of  problems  in 
criterion-referenced  tests.  For  example,  Johnson  and  Pearson  (1975) 
criticized criterion-referenced reading tests as being linguistically
naive.  They  argued  that  by  focusing  exclusively  on  observable 
interpretations  the  usefulness  of  measuring  instruments  is  diminished. 
Moreover, advocates of criterion-referenced measurement (e.g.,
Hambleton, Swaminathan, Algina, & Coulson, 1978; Nitko, 1980) have begun
to  accept  construct  validation  as  playing  a  useful  role  in  the 
validation  of  criterion-referenced  tests.  This  of  course  implies  their 
acceptance  of  the  legitimacy  of  using  nonobservable  constructs  in  test 
interpretation.  Indeed,  there  is  no  reason  why  an  emphasis  on  behavior 
and  cognition  cannot  coexist  in  both  an  instructional  and  a  psychometric 
sense  (Greeno,  1978). 

The  more  recent  emphasis  on  cognitive  psychology  has  at  least  two 
implications  for  psychometrics.  One  is  the  possibility  of  understanding 
test  performance  in  terms  of  cognitive  constructs  (e.g.,  Sternberg, 
1981).  The  other  possibility  is  the  exploitation  of  cognitive  theory 
for the improvement and design of both current and fundamentally new
tests.  In  the  next  section  I  discuss  both  possibilities. 

Validation  of  Test  Performance 

The  most  likely  immediate  influence  of  cognitive  science  on 
psychometrics  is  as  a  source  of  constructs  to  validate  test  scores. 
Messick  (1975)  has  eloquently  argued  for  the  necessity  of  construct 
validation, and the case need not be repeated here. It is sufficient to
say  that  the  availability  of  cognitive  or  information-processing 
constructs  and  the  revival  of  construct  validation  have  important 
implications  for  test  design. 

The  validation  of  both  aptitude  and  achievement  tests  has  relied 
very  little  on  cognitive  constructs.  In  the  recent  past  validation  of 
achievement tests was strongly influenced by content considerations.
This was in line with the behavioristic orientation of criterion-
referenced  testing  that  has  dominated  much  of  the  thinking  in  the  field. 
Similarly,  the  validation  of  aptitude  tests,  from  the  Scholastic 
Aptitude Test (SAT) to the Armed Services Vocational Aptitude
Battery (ASVAB), has relied almost exclusively on predictive validity,
and  this  paradigm  is  responsible  for  the  psychometric  nature  of 
procedures  for  improving  validity.  The  alternative  view  is  that 
understanding  the  nature  of  the  relationship,  as  opposed  to  just  its 
magnitude,  puts  test  developers  in  a  better  position  to  increase 
validity.  However,  validation  based  on  cognitive  constructs,  and  for 
that  matter  tests  developed  from  scratch  based  on  cognitive  theory,  need 
not  necessarily  yield  higher  predictive  validities.  It  is  known  from 
psychometric  theory  that  the  magnitude  of  correlation  between  a  test  and 
a  criterion  is  determined  by  the  proportion  of  variance  in  common 
between  the  two.  Clearly,  the  test  designer  has  control  over  the 
composition  of  the  test  but  not  over  the  composition  of  the  criterion. 
Hunt  (1983)  anticipated  this  when  he  noted  the  following: 

The cognitive science view may lead to the development of
new tests that are more firmly linked to a theory of cognition
than are present tests. Such tests are yet to be written.
There is no compelling reason to believe that new tests will
be better predictors of those criteria that are predicted by
today's tests. After all the present tests are the results of
an extensive search for instruments that meet the pragmatic
criterion  of  prediction.  Theoretically  based  tests  may  expand 
the  range  of  cognitive  functions  that  are  evaluated  and 
certainly  should  make  better  contact  with  our  theories  of 
cognition.  Theoretical  interpretation,  alone,  is  not  a 
sufficient  reason  for  using  a  test.  A  test  that  is  used  to 
make  social  decisions  must  meet  traditional  psychometric 
criteria  for  reliability  and  validity  [italics  added].  No 
small  effort  will  be  required  to  construct  tests  that  meet 
both  theoretical  and  pragmatic  standards.  The  effort  is 
justified,  for  our  methods  of  assessing  cognition  ought  to 
flow from our theories about the process of thinking. (p. 146)

Moreover, from a social perspective, validation solely in terms of
predictive  validity  is  inadequate.  A  predictive  validation  strategy  may 
have  been  appropriate  when  the  primary  object  of  testing  was  the 
identification  of  high-scoring  individuals,  but  society's  concern  with 
equality  requires  a  focus  on  low-scoring  individuals  also.  As  noted  by 
the  Committee  on  Ability  Testing  of  the  National  Research  Council: 

The  relationship  between  problem  solving  on  tests  and  everyday 
performance  has  taken  on  new  relevance  to  public  policy,  as 
attention has come to focus . . . not on those selected, as was
the case when tests were perceived primarily as identifying
excellence, but on those not selected. This shift in focus has
brought  new  prominence  to  the  question  of  what  is  being 
measured  by  a  given  test  or  item  type  and  has  pointed  up 
insufficiencies  from  a  public  perspective  in  validation 
strategies  based  solely  on  the  demonstration  of  external 
statistical  relationships.  (Wigdor  &  Warner,  1982,  p.  215) 

This  quotation  and  much  of  the  litigation  involving  tests  suggest 
that  in  the  years  ahead  test  designers  will  have  to  be  more  sensitive  to 
the  ethical  implications  of  testing  instruments.  That  is,  test 
designers  will  have  to  take  into  account  not  just  the  psychometric  and 
substantive  base  of  tests  but  their  consequences  as  well.  Messick 
(1980)  has  suggested  that  the  consequences  of  testing  should  be  a 
component  of  the  validation  process  rather  than  an  afterthought.  Just 
as  construct  validation  consists  of  collecting  evidence  from  many 
substantive  perspectives,  the  procedure  for  incorporating  consequences 
into the validation process consists of collecting information on the
implications  of  using  a  test  in  a  particular  situation.  However,  such 
listing  of  implications  cannot  be  fruitfully  done  in  a  psychometric 
vacuum: 

Appraising  the  possible  consequences  of  test  use  is  not 
a  trivial  process  under  any  circumstances,  but  it  is 
virtually  impossible  in  the  absence  of  construct  validity 
information  about  the  meaning  and  nature  of  test  scores. 

Just  as  the  construct  network  of  nomological  implications 
provided  a  rational  basis  for  hypothesizing  potential 
relationships  to  criteria,  so  it  also  provides  a  rational 
basis for hypothesizing potential outcomes and for
anticipating  possible  side  effects.  (Messick,  1980,  p.  15) 


An  Illustration 

An  example  of  construct  validation  in  the  context  of  an  adaptive 
test  is  provided  by  Bejar  and  Weiss  (1978).  They  postulated  a 
nomological  net  to  account  for  achievement  in  a  college  biology  course 
and  proceeded  to  test  its  feasibility  with  a  structural  equation  model 
(see  Bentler,  1978,  for  a  discussion  of  construct  validation  by  means  of 
structural  equation  models).  The  net  is  seen  in  Fig.  1.  The  rectangles 
represent  the  constructs  postulated  to  account  for  the  relationships 
among  the  six  observable  variables.  The  coefficients  next  to  the  arrows 
are  those  that  need  to  be  estimated.  The  direction  of  the  arrow 
indicates  that  the  variable  at  the  head  of  the  arrow  is  regressed  on  the 
variable  at  the  other  end  of  the  arrow.  Bejar  and  Weiss  concluded  that 
the  postulated  model  indeed  fitted  the  data  and  that  although  there  were 
no  major  differences  between  the  validity  of  paper-and-pencil  and 
adaptive  versions  of  the  test,  the  adaptive  test  required  25%  fewer 
items.  At  the  time,  such  a  reduction  in  the  number  of  items,  compared 
to  the  cost  of  an  adaptive  test,  may  not  have  been  cost-effective.  By 
the  1980s,  of  course,  the  hardware  cost  per  terminal  could  have  easily 
been  less  than  $1,000,  and  the  economics  of  adaptive  testing  may  thus 
appear  more  attractive. 

(Insert  Figure  1  about  here) 

The  net  postulated  by  Bejar  and  Weiss  was  dictated  more  by  the 
availability  of  scores  than  by  an  information-processing  model  of 
achievement. If measures inspired by cognitive science had been
available they could easily have been used. Rose (1980), for example,
developed  a  battery  of  tasks  that  are  indicators  of  information 
processes.  The  use  of  that  battery  in  the  validation  of  the  adaptive 
biology  achievement  test  would  have  been  consistent  both  with  what  has 
been  called  a  cognitive  correlates  and  cognitive  components  approach  to 
cognitive  psychometrics  (see  Pellegrino  &  Glaser,  1979;  Sternberg, 

1981).  In  the  cognitive  correlates  approach,  the  goal  is  to  test 
subjects  on  several  low-level  tasks  that  are  believed  to  be  indicative 
of  the  subjects'  efficiency  in  processing  information.  An  example  of  a 
low-level  task  is  matching  whether  two  letters,  such  as  C c_,  constitute  a 
physical  match  or,  as  in  this  case,  a  name  match.  Because  the  tasks  are 
easy,  response  latency,  rather  than  correctness,  is  the  outcome  of 
interest  on  such  tasks.  In  a  cognitive  components  approach,  the  aim  is 
to  postulate  a  model  of  information  processing  and  to  test  by  obtaining 
data  on  the  performance  of  subjects  on  testlike  tasks.  The  outcomes 
from  either  approach  can  be  used  as  part  of  a  construct  validation  study 
designed  to  gain  further  understanding  of  the  performance  of  students  in 
a  test. 

Although  it  is  beyond  the  scope  of  this  chapter  to  consider 
achievement  testing  in  detail  (see  Bejar,  1983b),  it  should  be  noted 
that  performance  on  an  achievement  test  depends  on  both  processing 
components and the storing of information. The cognitive components and
correlates approaches emphasize the processing part but not the storage
part, that is, the schema for representing information. Other
researchers  (e.g.,  Burton,  1982)  have  emphasized  the  storage  part  by 
elaborating  constructs  about  how  the  students  represent  knowledge. 

The Bejar and Weiss (1978) study, in addition to illustrating what
Messick (1980) has called the evidential component of validation, also
illustrated the consequential aspect of validation. Bejar and Weiss
found evidence in their data of a medium effect. That is, it seemed
that the medium of administration, whether paper and pencil or
computer, influenced the scores to some extent. If this medium
effect  can  be  replicated,  the  possible  ethical  consequences  will  be 
something  for  future  test  designers  to  worry  about.  For  example, 
students  from  less-affluent  homes  are  less  likely  to  have  been  exposed 
to  keyboards  and  CRTs  and  thus  may  obtain  lower  scores.  Because  these 
students  are  also  likely  to  have  been  exposed  to  a  less-adequate 
educational  environment,  it  would  add  insult  to  injury  to  test  those 
students  by  computer  without  first  ensuring  that  they  are  at  ease  with 
the  computer  as  a  medium  for  test  delivery.  The  work  of  Snow  and 
Peterson  (Chapter  5  in  this  volume)  has  obvious  implications  for 
research  on  the  detection  of  such  problems. 

Towards  Scientifically  Based  Test  Design 

In  the  previous  section  I  discussed  construct  validation  as 
a  means  to  a  better  understanding  of  test  scores.  Although,  no  doubt, 
such information could be useful to a test developer, he or she may be
at a loss as to how to incorporate that information in the creation of new
items  or  of  entirely  new  tests.  In  this  section  I  argue  that  from  a 
test  design  perspective  it  is  necessary  to  shift  the  focus  of  attention 
from  the  examinee  to  the  item.  That  is,  just  as  construct  validation  of 
test  scores  entails  research  to  understand  differences  among  examinees, 
construct  validation  applied  to  test  design  entails  research  to 
understand differences among items. More concretely, it is necessary to
account for the differences among items with respect to their
characteristics, especially difficulty. I suggest that cognitive
science is an important source of ideas for accomplishing that goal.

This  integration  of  psychometric  models  and  cognitive  science,  as 
reflected  in  the  work  of  Embretson  (1983)  and  Fischer  (1973),  is 
important not only for advancing the scientific status of psychometric
instruments  but  also  for  creatively  incorporating  technological  advances 
into  the  testing  process.  For  example,  if  test  developers  are  able  to 
account  for  differences  among  items  they  may  have  captured  the  knowledge 
necessary  to  synthesize  items  of  known  characteristics  (see  Egan,  1979). 
They  may,  in  short,  be  able  to  write  a  computer  program  capable  of 
composing an item with known psychometric characteristics. The chapters
by  Sternberg  and  McNamara  (Chapter  2),  Pellegrino,  Mumaw,  and  Cantoni 
(Chapter  3),  and  Butterfield,  Nielsen,  Tangen,  and  Richardson  (Chapter 
4)  in  this  volume  provide  a  basis  for  research  toward  that  goal. 

I  would  be  the  first  to  agree  that  synthesizing  items  is  not  likely 
to  be  easy  and  that  sustained  research  is  required  before  practical 
results  will  be  available.  Nevertheless,  adopting  that  effort  as  a  goal 
puts  test  developers  in  the  enviable  position  of  simultaneously  pursuing 
scientific  and  economic  goals.  That  is,  the  ability  to  synthesize  items 
is  likely  to  improve  the  productivity  of  the  test  designer  in  much  the 
same  way  that  computers  have  altered  the  productivity  of,  for  example, 
graphics designers in various industries. To reach that point, however,
test developers will have to do considerable work to establish and
validate a theory that explains the characteristics of items.

It  is  beyond  the  scope  of  this  chapter  to  outline  a  detailed 
research program that will in the end allow the synthesis of items.
However, a natural starting point is to account for the variability
among existing items (see Carroll, 1979, for an attempt to do so).
Unfortunately, this task is made difficult by the fact that most
existing tests are of the multiple-choice variety. No doubt with such
items the context in which the correct alternative occurs partially
determines the psychometric characteristics of the item. This
complication is all the more unfortunate because the multiple-choice
item was invented to facilitate group testing, and its usefulness will
presumably diminish as computers are used more and more in the
administration of individualized tests. In the
meantime,  however,  test  designers  must  be  ready  to  deal  with  the 
complications  introduced  by  multiple-choice  items. 

Psycholinguistic  theory  is  a  rich  source  of  hypotheses  for  the 
study  of  verbal  tests  such  as  reading  comprehension  tests  and  writing 
ability  tests.  Psychologists  have  devoted  considerable  attention  to 
sentence  comprehension  (e.g.,  Kintsch,  1977).  One  early  theory  was 
postulated by Miller (Miller & McKean, 1964) and is known as the
Derivational  Complexity  Theory.  According  to  this  theory  the 
comprehensibility  of  a  sentence  is  determined  by  the  syntactic 
complexity  of  the  sentence.  Complexity  was  measured  as  the  number  of 
transformations  required  to  go  from  the  deep  structure  to  the  surface 
structure  of  a  sentence.  Although  this  particular  theory  is  not  now 
well  supported,  it  seems  reasonable  to  suggest  that  if  comprehensibility 
of  a  sentence  is  affected  by  some  measure  of  syntactic  and  semantic 
complexity  then  psychometric  difficulty  of  an  item  based  on  that 
sentence  will  to  some  extent  also  depend  on  the  syntactic  and  semantic 
complexity of the sentence.

A test with items based on sentences is the Test of Standard
Written  English  (TSWE),  sponsored  by  the  College  Board  and  produced  by 
the  Educational  Testing  Service.  One  of  the  two  item  types  in  the  test 
consists  of  a  sentence  that  may  or  may  not  contain  a  grammatical  error. 
The  examinee's  task  is  to  determine  whether  the  sentence  as  it  stands 
contains  an  error;  if  it  does,  the  examinee  must  select  from  several 
alternatives  to  correct  the  sentence.  One  way  to  apply  these  ideas  to 
the  TSWE  is  to  obtain  several  measures  of  linguistic  complexity  on  each 
item  and  study  the  relationship  of  those  measures  to  psychometric 
difficulty.  If  a  stable  relationship  is  found,  then,  in  principle,  the 
resulting  model  may  be  used  to  predict  the  difficulties  of  new  items  and 
even  to  modify  items  so  they  will  be  easier  or  harder. 
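
As a rough sketch of the modeling step just described, and not a procedure taken from the TSWE program, the following Python fragment regresses item difficulty on a single complexity measure and uses the fitted line to predict the difficulty of a new item. The data values, the restriction to one predictor, and the names fit_line and predict_difficulty are all assumptions made for illustration.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical calibration data: one complexity score per existing item
# (for example, a count of clauses) and that item's estimated difficulty.
complexity = [1, 2, 3, 4, 5, 6]
difficulty = [-1.2, -0.4, -0.3, 0.5, 0.6, 1.4]

intercept, slope = fit_line(complexity, difficulty)


def predict_difficulty(score):
    """Predicted difficulty of a new item with the given complexity score."""
    return intercept + slope * score


print(round(predict_difficulty(7), 2))
```

With several complexity measures the same idea would extend to multiple regression. Whether the relationship is stable enough to be useful is the empirical question raised above.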

Although  the  preceding  remarks  are  speculative,  some  research  along 
these lines already exists. For example, the Degrees of Reading Power
(DRP) test, sponsored by the College Board, is a reading comprehension
cloze test. Unlike the usual cloze test, the DRP is a multiple-choice
test; that is, the examinee is provided several choices for filling in
the deleted word. The difficulty of those items can apparently be
predicted on the basis of the readability index of the passage.
Similarly, Swinton (personal communication) has experimented with verbal
analogy items by forming different versions of an item in order to alter
its difficulty.

The  idea  of  synthesizing  items  of  known  characteristics  has  been 
implemented  by  at  least  one  research  team  (Burton,  1982).  They  were 
concerned with the design of diagnostic tests of subtraction. Their
goal was to infer what misconceptions may account for a student's errors
in arithmetic. To speed up that process, it is necessary to synthesize
items  "on  the  fly"  that  are  most  informative  with  respect  to  the  current 
set  of  hypothesized  misconceptions.  That  is,  a  computer  creates  the 
items  as  they  are  needed  rather  than  retrieving  them  from  a  pool. 
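
The idea of creating an informative item on demand can be sketched in a few lines of Python. The sketch below is not the program described by Burton (1982). The two faulty procedures are simplified, hypothetical models of subtraction misconceptions, and the function names are invented for illustration. An item is treated as informative when every hypothesized procedure predicts a different answer, so that a single response separates the hypotheses.

```python
import random


def correct(a, b):
    """The correct subtraction procedure."""
    return a - b


def smaller_from_larger(a, b):
    """Hypothetical bug: in each column, subtract the smaller digit from
    the larger one and never borrow."""
    result, place = 0, 1
    while a > 0 or b > 0:
        result += abs(a % 10 - b % 10) * place
        a, b, place = a // 10, b // 10, place * 10
    return result


def borrow_no_decrement(a, b):
    """Hypothetical bug: add 10 to the top digit when needed, but forget
    to decrement the next column."""
    result, place = 0, 1
    while a > 0 or b > 0:
        top, bottom = a % 10, b % 10
        if top < bottom:
            top += 10
        result += (top - bottom) * place
        a, b, place = a // 10, b // 10, place * 10
    return result


def synthesize_item(hypotheses, rng, max_tries=1000):
    """Generate a two-digit item a - b (with a >= b) on which all
    hypothesized procedures give different answers."""
    for _ in range(max_tries):
        a = rng.randint(10, 99)
        b = rng.randint(1, a)
        answers = [h(a, b) for h in hypotheses]
        if len(set(answers)) == len(answers):
            return a, b
    raise RuntimeError("no discriminating item found")


hypotheses = [correct, smaller_from_larger, borrow_no_decrement]
a, b = synthesize_item(hypotheses, random.Random(0))
print(a, b, [h(a, b) for h in hypotheses])
```

Under these assumed models, the item 52 - 17 cannot separate the two bugs, because both predict 45, whereas 52 - 19 can, because they predict 47 and 43. Retrieving items from a fixed pool offers no guarantee that such a discriminating item is available when it is needed.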

One  area  that  seems  ready  for  the  integration  of  cognitive  theory 
and  psychometric  models  is  spatial  ability.  Spatial  ability  has  been  a 
subject  of  intense  investigation.  A  well-established  finding  is  that 
the  response  latency  to  problems  that  require  mental  manipulation  is  a 
function  of  the  physical  characteristics  of  the  test  stimuli.  For 
example,  the  time  it  takes  to  determine  whether  two  geometric  figures 
are  the  same  is  a  linear  function  of  their  angular  disparity  (see 
Cooper,  1980).  This  finding  suggests  that  the  psychometric  difficulty 
of  spatial  items  could  be  predicted  from  an  analysis  of  their  physical 
characteristics.  A  project  investigating  this  possibility  is  under  way 
at  Educational  Testing  Service  under  the  sponsorship  of  the  Office  of 
Naval  Research. 
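
If latency is indeed a linear function of angular disparity, the relationship can be inverted to choose the disparity of a new item so that it has a desired predicted latency. The following Python sketch illustrates that step only. The intercept and slope are hypothetical values, not estimates from Cooper (1980) or from the project mentioned above, and the step from predicted latency to psychometric difficulty remains the open research question.

```python
# Hypothetical parameters of the linear latency function.
INTERCEPT_S = 1.0        # seconds at zero angular disparity (assumed)
SLOPE_S_PER_DEG = 0.02   # additional seconds per degree (assumed)


def predicted_latency(angle_deg):
    """Predicted response latency, in seconds, for a given disparity."""
    return INTERCEPT_S + SLOPE_S_PER_DEG * angle_deg


def angle_for_latency(target_s):
    """Angular disparity, in degrees, whose predicted latency is target_s."""
    angle = (target_s - INTERCEPT_S) / SLOPE_S_PER_DEG
    if not 0 <= angle <= 180:
        raise ValueError("target latency is outside the range of the model")
    return angle


print(predicted_latency(90))
print(angle_for_latency(2.0))
```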

Concluding  Comments 

In  this  chapter  I  have  attempted  to  enumerate  some  of  the  ways  in 
which  the  integration  of  technology,  cognitive  science,  and  psychometric 
theory  can  benefit  test  design.  The  state  of  the  art  is  most  advanced 
with  respect  to  the  administration  of  tests,  with  the  notion  of  adaptive 
tests rapidly approaching operational implementation. As I have
suggested, adaptive testing is a significant step forward. However,
from a user's point of view, an adaptive test is just a multiple-choice
test administered by computer because the improvements in efficiency
and even the test's psychological advantages are not obvious to the
naked eye.

I  have  argued  that  to  move  the  state  of  the  art  forward  it  will  be 
necessary  to  pay  closer  attention  to  the  psychological  foundation  of 
tests. This effort calls, on one hand, for the construct validation of
tests from both an evidential and a consequential perspective. On the
other  hand,  I  have  also  argued  that  to  improve  the  scientific  basis  of 
test  design  it  is  necessary  to  focus  attention  not  only  on  variability 
among  examinees  but  also  on  variability  among  items.  In  particular,  a 
better  understanding  of  why  items  behave  the  way  they  do  is  needed. 

From  a  practical  perspective  the  payoff  for  doing  so  will  be  the 
possibility  of  ultimately  being  able  to  synthesize  items  of  known 
psychometric characteristics.